sf package.Recommended textbooks:
Other sources:
sf, sp tend to be well-documented, and with worked examples.EPSG:5513)..shp), geodatabase (.gdb), geopackage (.gpk), geoJSON (.geojson)..shp) format. The portal allows downloading the maps at different resolutions. In this guide, we will be working with the most precise one, which is referred to as ‘Základná úroveň/ ZBGIS - Administratívne hranice’.sf packageInstall the package using:
st_read() to load mapsst_read(dsn, layer)
dsn: file name of the map file.layer: certain file formats allow saving multiple layers, we will work with single layer files (it does not mean we can’t have multiple indicators/statistic for each location).Load the map of Slovakia at the municipality level:
To see what coordinate system it use st_crs() function:
## Coordinate Reference System:
## No EPSG code
## proj4string: "+proj=krovak +lat_0=49.5 +lon_0=24.83333333333333 +alpha=30.28813975277778 +k=0.9999 +x_0=0 +y_0=0 +ellps=bessel +units=m +no_defs"
Due to the missing code, we do a transformation to the official code using st_transform() function.
## DOW AUT ACH SOI FACC
## Min. :2019-04-30 Min. :1 Min. :1 Min. :9 FA004:2927
## 1st Qu.:2019-04-30 1st Qu.:1 1st Qu.:1 1st Qu.:9
## Median :2019-04-30 Median :1 Median :1 Median :9
## Mean :2019-04-30 Mean :1 Mean :1 Mean :9
## 3rd Qu.:2019-04-30 3rd Qu.:1 3rd Qu.:1 3rd Qu.:9
## Max. :2019-04-30 Max. :1 Max. :1 Max. :9
##
## IDN4 NM4 IDN3 NM3
## Min. :500011 Lúčka : 4 Min. :101.0 Košice - okolie: 114
## 1st Qu.:508989 Porúbka : 4 1st Qu.:402.0 Rimavská Sobota: 107
## Median :518204 Dúbrava : 3 Median :606.0 Prešov : 91
## Mean :521206 Lúčky : 3 Mean :547.7 Levice : 89
## 3rd Qu.:526138 Petrovce: 3 3rd Qu.:708.0 Bardejov : 86
## Max. :599972 Píla : 3 Max. :811.0 Trebišov : 82
## (Other) :2907 (Other) :2358
## IDN2 NM2 VYMERA
## Min. :1.000 Prešovský :665 Min. : 361255
## 1st Qu.:4.000 Banskobystrický:516 1st Qu.: 7207245
## Median :6.000 Košický :461 Median : 11655767
## Mean :5.415 Nitriansky :354 Mean : 16752332
## 3rd Qu.:7.000 Žilinský :315 3rd Qu.: 19769322
## Max. :8.000 Trenčiansky :276 Max. :359784254
## (Other) :340
## Shape_Leng Shape_Area geometry
## Min. : 3211 Min. : 354208 MULTIPOLYGON :2927
## 1st Qu.: 14031 1st Qu.: 7207035 epsg:5513 : 0
## Median : 18582 Median : 11647728 +proj=krov...: 0
## Mean : 21367 Mean : 16747701
## 3rd Qu.: 25503 3rd Qu.: 19794868
## Max. :204284 Max. :359936992
##
From this, it is obvious that:
IDN4 is a unique identifier for each municipality.NM4 is the name of the municipality.IDN3 is the identifier for okres.NM3 is the name for okres.IDN2 is the identifier for kraj.NM2 is the name for kraj.Shape_Leng and Shape_Area are the perimeter and the area of the municipality.VYMERA is given that the values slightly deviate from the Shape_Area.The other columns are not relevant for this tutorial.
In the visualisation section, we will stick to these values even though they are not particularly interesting. In later sections we will add more interesting data.
Before, we proceed, we rename the columns to more meaningful names:
plot() methodplot() methodThis is a good place to start when doing exploratory analysis, but it’s better to use more sophisticated functions to produce better-looking plots.
ggplotggplot can also be used in conjunction with spatial objects.
ggplottmapThis is a very versatile library that allows integration with Open Street Map data and interactive maps on top of regular plotting. What we show here is only scratching the surface, for more examples, please visit the tmap’s github page.
Before we proceed, we install the libraries and load them.
tmap## tmap mode set to plotting
tmaptmapFor example, we can do a breakdown by kraj.
tmaptmapFor plots with fewer components (e.g. kraj), we might want to add labels. Be careful with the overlaps of the labels and overflows outside the frame. This post gives a good solution for the latter.
tm_shape(mapa.kraj.sf) +
tm_fill(col = "plocha_uzemia", style = "cont", title="Plocha m\u00B2") +
tm_style('gray', title = 'Okresy SR') +
tm_text('kraj', auto.placement = TRUE)To save the map, use tmap_save() function.
tmapmapa.okres.sf.gps <- st_transform(x = mapa.okres.sf, crs=4326)
osm_tiles = tmaptools::read_osm(st_bbox(mapa.okres.sf.gps), type = "osm")
tmap.object <- qtm(osm_tiles, raster.alpha = 0.6) +
tm_shape(mapa.okres.sf.gps) +
tm_fill(col = "Shape_Area", style = "cont", title="Plocha m\u00B2") +
tm_style('gray', title = 'Okresy SR')Load the data:
library(readxl)
results.obec.file.path <- file.path(getwd(), 'data', "raw",
"PRE_2019_KOLO1_xlsx",
"PRE_2019_KOLO1_tab03e.xlsx")
results.obec <- read_excel(results.obec.file.path)
results.obec <- results.obec[-(1:4), ]
names(results.obec) <- c('kraj_id', 'kraj', 'uzemny_obvod_id', 'uzemny_obvod',
'okres_id', 'okres', 'obec_id', 'obec',
'candidate_id', 'candidate_fname', 'candidate_sname',
'platne_hlasy_count', 'platne_hlasy_percent',
'pullouts')
results.obec <- results.obec %>%
dplyr::mutate_at(c("platne_hlasy_count", "platne_hlasy_percent"), as.numeric)Join the data with the map:
Since we are working with ‘obec’ level, we focus only on one ‘kraj’ region, and we only compare top 4 candidates in that region.
zilinsky.kraj.results.sf <- results.obec.sf %>%
dplyr::filter(kraj_id == 5) # zilinsky kraj
# select top 4 candidates in Zilinsky kraj
zilinsky.kraj.top.candidates <- zilinsky.kraj.results.sf %>%
dplyr::group_by(name) %>%
dplyr::summarise(vote_total = sum(platne_hlasy_count)) %>%
dplyr::arrange(vote_total) %>%
dplyr::top_n(4, vote_total) %>%
dplyr::pull(name)
# filter out only the 4 candidates
zilinsky.kraj.results.top.candidates.sf <- zilinsky.kraj.results.sf %>%
dplyr::filter(name %in% zilinsky.kraj.top.candidates)As we can see this visualisation is not ideal, so we go for the interactive map where we can zoom in/out as we please. We only focus on Zuzana Caputova and Maros Sefcovic.
library(tidyr)
zilinsky.kraj.results.top.2.sf <- zilinsky.kraj.results.top.candidates.sf %>%
dplyr::filter(name %in% c('Maroš Šefčovič', 'Zuzana Čaputová'))
zilinsky.results.wide.sf <- zilinsky.kraj.results.top.2.sf %>%
dplyr::select(name, platne_hlasy_percent, obec) %>%
tidyr::spread(name, platne_hlasy_percent)
tmap_mode("view")
tm <- tm_shape(zilinsky.results.wide.sf) +
tm_polygons(c('Maroš Šefčovič', 'Zuzana Čaputová')) +
tm_facets(sync = TRUE, ncol = 2)
tm